Preliminary analyses for microeukaryote tag-sequence survey.
Description: Investigate the diversity of single-celled microbial eukaryotic communities across several deep-sea hydrothermal vent sites (including NE Pacific, Caribbean). We plan to address questions related to the environmental factors that shape protistan community dynamics, and determine if patterns in species diversity and distribution vary at different deep-sea habitats. These questions will be addressed using similarly generated metabarcoding data from several distinct hydrothermal vents. Along with characterizing community structure, we plan to evaluate interactions between protist species (to identify putative predator-prey or parasite-host relationships) and their environment (to explore their relationship to geochemical properties).
Questions to address
What is the general biogeography and distribution of the deep-sea hydrothermal vent microbial eukaryotic community?
What community structure features (i.e., species richness, proportion cosmopolitan versus endemic, species evenness) are shared across or unique to deep-sea hydrothermal vent sites?
What environmental features (i.e., temperature, geochemistry) influence microbial eukaryotic community diversity? Can we identify if certain environmental factors select for putative vent endemics?
Datasets included: - Gorda Ridge 2019 cruise - Axial Seamount time series - 2013, 2014, & 2015 - Mid-Cayman Rise 2020 cruise
All data generated from extracted RNA, reverse transcribed to cDNA and amplified with primers that target the V4 hypervariable region on the 18S rRNA gene.
Analysis done with QIIME2, kept 40-60% of the sequences through the QC process and generated Amplicon Sequence Variants (ASVs) with DADA2. Taxonomic assignment done with vsearch using the PR2 database (v4.14) at 80% identity. See the seq-analysis directory for QIIME2 code.
After determining ASVs for each sequence run, ASV tables were merged.
merged_tax <- read_delim("data-input/taxonomy.tsv", delim = "\t")
merged_asv <- read_delim("data-input/microeuk-merged-asv-table.tsv", delim = "\t", skip = 1)
# head(merged_tax)Still want to find more metadata. As of Oct 16, have temperature and prokaryote cell/ml if available, but can add in more metadata for the MCR cruise.
metadata <- read.delim("data-input/samplelist-metadata.txt")
# head(metadata)Remove samples from Gorda Ridge microcolonizers and from the FLP experiments (Gorda Ridge and Mid-Cayman Rise).
asv_wtax <- merged_asv %>%
select(FeatureID = '#OTU ID', everything()) %>%
pivot_longer(cols = !FeatureID,
names_to = "SAMPLE", values_to = "value") %>%
left_join(merged_tax, by = c("FeatureID" = "Feature ID")) %>%
left_join(metadata) %>%
filter(!grepl("Siders_", SAMPLE)) %>%
filter(SAMPLETYPE != "Incubation") %>%
filter(SAMPLETYPE != "Microcolonizer") %>%
mutate(DATASET = case_when(
grepl("_GR_", SAMPLE) ~ "GR",
grepl("Gorda", SAMPLE) ~ "GR",
grepl("_MCR_", SAMPLE) ~ "MCR",
grepl("Axial", SAMPLE) ~ "Axial",
TRUE ~ "Control or blank")) %>%
separate(Taxon, c("Domain", "Supergroup",
"Phylum", "Class", "Order",
"Family", "Genus", "Species"), sep = ";", remove = FALSE) %>%
unite(SAMPLENAME, SITE, SAMPLETYPE, YEAR, VENT, SAMPLEID, sep = " ", remove = FALSE)## Warning: Expected 8 pieces. Additional pieces discarded in 481464 rows [109,
## 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125,
## 126, 127, 128, ...].
## Warning: Expected 8 pieces. Missing pieces filled with `NA` in 486972 rows [1,
## 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
# View(asv_wtax)
# head(asv_wtax) ## Complete ASV table with full taxonomy names and annotated sample informationBarplots to show total number of sequences and total number of ASVs.
Total number of sequences and ASVs parallel each other. The Axial and Gorda Ridge data were run on the same sequence run, with Mid-Cayman Rise run on a separate MiSeq run - so the average number of sequences (and ASVs) varies between these two runs. A few samples have too few sequences, they will be removed below.
This newest version of PR2 has bacteria and archaea in it. Very, very few were assigned to this. Majority assigned to eukaryotes.
# head(asv_wtax)
plot_grid(
# Total number of ASVs
asv_wtax %>%
filter(value > 0) %>%
ggplot(aes(x = SAMPLENAME)) +
geom_bar(stat = "count", width = 0.9) +
labs(y = "Total ASVs per sample", x = "") +
coord_flip() +
scale_y_continuous(position = "right") +
theme_linedraw() +
facet_grid(DATASET ~ ., scale = "free", space = "free") +
theme(axis.text.x = element_text(angle = 0, hjust = 1, vjust = 1),
axis.text.y = element_text(angle = 0, hjust = 1, vjust = 1),
strip.background = element_blank(), strip.text = element_text(color = "black")),
asv_wtax %>%
group_by(SAMPLENAME, SITE, Domain, DATASET) %>%
summarise(SUM_SEQ_DOMAIN = sum(value)) %>%
ggplot(aes(x = SAMPLENAME, y = SUM_SEQ_DOMAIN, fill = Domain)) +
geom_bar(stat = "identity", color = "black", width = 0.9) +
labs(y = "Total sequences per sample", x = "") +
coord_flip() +
scale_y_continuous(position = "right") +
theme_linedraw() +
facet_grid(DATASET ~ ., scale = "free", space = "free") +
theme(axis.text.x = element_text(angle = 0, hjust = 1, vjust = 1),
axis.text.y = element_text(angle = 0, hjust = 1, vjust = 1),
strip.background = element_blank(), strip.text = element_text(color = "black"),
legend.position = "right"),
ncol = 2, align = c("hv"), axis = c("lr"))asv_wtax %>% filter(value > 0) %>%
group_by(SAMPLENAME, DATASET, SITE) %>%
summarise(SEQ_SUM = sum(value),
ASV_COUNT = n()) %>%
ungroup() %>%
gt(
groupname_col = c("DATASET", "SITE"),
rowname_col = "SAMPLENAME"
)| SEQ_SUM | ASV_COUNT | |
|---|---|---|
| Axial - Axial | ||
| Axial Background 2015 Deep seawater BSW1500m | 48657 | 494 |
| Axial Plume 2015 Anemone Plume AnemonePlume | 33599 | 474 |
| Axial Vent 2013 Anemone FS891 | 38672 | 449 |
| Axial Vent 2013 Boca FS905 | 190675 | 2174 |
| Axial Vent 2013 Dependable FS900 | 52 | 4 |
| Axial Vent 2013 El Guapo FS896 | 57160 | 804 |
| Axial Vent 2013 Marker113 FS903 | 77473 | 685 |
| Axial Vent 2013 Marker33 FS904 | 51169 | 473 |
| Axial Vent 2013 N3Area FS898 | 53733 | 610 |
| Axial Vent 2013 Skadi FS902 | 43916 | 40 |
| Axial Vent 2014 Marker113 FS906 | 33394 | 333 |
| Axial Vent 2014 Marker33 FS908 | 59502 | 771 |
| Axial Vent 2014 Skadi FS910 | 73959 | 990 |
| Axial Vent 2015 Marker113 FS915 | 63634 | 656 |
| Control or blank - control | ||
| control Control 2020 labblank | 3792 | 20 |
| control Control 2020 shipboard | 83841 | 42 |
| GR - GordaRidge | ||
| GordaRidge Background 2019 Deep seawater BSW020 | 5 | 1 |
| GordaRidge Background 2019 Deep seawater BSW056 | 25095 | 498 |
| GordaRidge Background 2019 Near vent BW Plume001 | 104106 | 1206 |
| GordaRidge Background 2019 Shallow seawater BSW081 | 38196 | 572 |
| GordaRidge Plume 2019 Candelabra Plume Plume036 | 57571 | 473 |
| GordaRidge Plume 2019 Mt Edwards Plume Plume096 | 38604 | 587 |
| GordaRidge Vent 2019 Candelabra Vent086 | 57369 | 721 |
| GordaRidge Vent 2019 Candelabra Vent087 | 45802 | 663 |
| GordaRidge Vent 2019 Candelabra Vent088 | 40719 | 631 |
| GordaRidge Vent 2019 Mt Edwards Vent009 | 42591 | 245 |
| GordaRidge Vent 2019 Mt Edwards Vent010 | 57564 | 558 |
| GordaRidge Vent 2019 Mt Edwards Vent011 | 71430 | 665 |
| GordaRidge Vent 2019 Sir Ventsalot Vent105 | 57998 | 456 |
| GordaRidge Vent 2019 Sir Ventsalot Vent106 | 41250 | 654 |
| GordaRidge Vent 2019 Sir Ventsalot Vent107 | 43230 | 590 |
| GordaRidge Vent 2019 Venti Latte Vent039 | 45588 | 486 |
| GordaRidge Vent 2019 Venti Latte Vent040 | 57186 | 688 |
| GordaRidge Vent 2019 Venti Latte Vent041 | 64680 | 835 |
| Axial - Laboratory | ||
| Laboratory Control 2019 Control Control | 2809 | 23 |
| GR - Laboratory | ||
| Laboratory Control 2019 Control Control | 92426 | 201 |
| MCR - Piccard | ||
| Piccard Background 2020 BSW | 139563 | 646 |
| Piccard Plume 2020 Plume | 123391 | 333 |
| Piccard Vent 2020 LotsOShrimp | 204065 | 687 |
| Piccard Vent 2020 Shrimpocalypse | 146104 | 706 |
| MCR - VonDamm | ||
| VonDamm Background 2020 BSW | 119010 | 864 |
| VonDamm Plume 2020 Plume | 151501 | 1165 |
| VonDamm Vent 2020 ArrowLoop | 115756 | 932 |
| VonDamm Vent 2020 MustardStand | 160115 | 56 |
| VonDamm Vent 2020 OldManTree | 144925 | 600 |
| VonDamm Vent 2020 Rav2 | 286292 | 1792 |
| VonDamm Vent 2020 ShrimpHole | 164954 | 961 |
| VonDamm Vent 2020 WhiteCastle | 177663 | 34 |
| VonDamm Vent 2020 X18 | 169301 | 638 |
After removing contaminate ASVs below, I will set threshold of 10,000 sequences- if a sample has fewer than this, chuck it.
Import sample description text file, import as phyloseq library, and remove potential contaminate ASVs and sequences. Catalog total number of ASVs and sequences removed from analysis.
# library(decontam); library(phyloseq)tax_matrix <- merged_tax %>%
select(FeatureID = `Feature ID`, Taxon) %>%
separate(Taxon, c("Domain", "Supergroup",
"Phylum", "Class", "Order",
"Family", "Genus", "Species"), sep = ";", remove = FALSE) %>%
column_to_rownames(var = "FeatureID") %>%
as.matrix## Warning: Expected 8 pieces. Additional pieces discarded in 8916 rows [1, 2, 5,
## 8, 9, 10, 11, 12, 16, 19, 21, 22, 23, 24, 26, 27, 28, 29, 30, 33, ...].
## Warning: Expected 8 pieces. Missing pieces filled with `NA` in 9018 rows [3, 4,
## 6, 7, 13, 14, 15, 17, 18, 20, 25, 31, 32, 38, 39, 42, 43, 44, 46, 47, ...].
asv_matrix <- merged_asv %>%
select(FeatureID = '#OTU ID', everything()) %>%
column_to_rownames(var = "FeatureID") %>%
as.matrix
# Align row names for each matrix
rownames(tax_matrix) <- row.names(asv_matrix)
# Set rownames of metadata table to SAMPLE information
row.names(metadata) <- metadata$SAMPLE# Import asv and tax matrices
ASV = otu_table(asv_matrix, taxa_are_rows = TRUE)
TAX = tax_table(tax_matrix)
phylo_obj <- phyloseq(ASV, TAX)
# Import metadata as sample data in phyloseq
samplenames <- sample_data(metadata)
# join as phyloseq object
physeq_wnames = merge_phyloseq(phylo_obj, samplenames)
# colnames(ASV)
## Check
# physeq_wnames# When "Control" appears in "Sample_or_Control column, this is a negative control"
sample_data(physeq_wnames)$is.neg <- sample_data(physeq_wnames)$Sample_or_Control == "Control"# ID contaminants using Prevalence information
contam_prev <- isContaminant(physeq_wnames,
method="prevalence",
neg="is.neg",
threshold = 0.5, normalize = TRUE) ## Warning in .is_contaminant(seqtab, conc = conc, neg = neg, method = method, :
## Removed 1 samples with zero total counts (or frequency).
## Warning in .is_contaminant(seqtab, conc = conc, neg = neg, method = method, :
## Removed 1 samples with zero total counts (or frequency).
# Report number of ASVs IDed as contaminants
table(contam_prev$contaminant)##
## FALSE TRUE
## 17880 54
0.5 - this threshold will ID contaminants in all samples that are more prevalent in negative controls than in positive samples.
As of Oct 16 2021: 54 ASVs deemed to be contaminant and will be removed.
# Subset contaminant ASVs
contams <- filter(contam_prev, contaminant == "TRUE")
list_of_contam_asvs <- as.character(row.names(contams))
# length(list_of_contam_asvs)
taxa_contam <- as.data.frame(tax_matrix) %>%
rownames_to_column(var = "FeatureID") %>%
filter(FeatureID %in% list_of_contam_asvs)
# head(taxa_contam)# View(asv_wtax)
asv_wtax_decon <- asv_wtax %>%
filter(!(FeatureID %in% list_of_contam_asvs)) %>%
filter(!(Sample_or_Control == "Control"))
tmp_orig <- (asv_wtax %>% filter(!(Sample_or_Control == "Control")))
# Stats on lost
x <- length(unique(tmp_orig$FeatureID)); x## [1] 17934
y <- length(unique(asv_wtax_decon$FeatureID)); y## [1] 17880
y-x## [1] -54
100*((y-x)/x) # 54 total ASVs lost## [1] -0.301104
a <- sum(tmp_orig$value);a #3.817 million## [1] 3817219
b <- sum(asv_wtax_decon$value);b #3.799 million ## [1] 3799135
100*((b-a)/a)## [1] -0.473748
# Lost 0.47% of sequences from whole dataset.
## Subsample to clean ASVs
asv_wtax_wstats <- asv_wtax %>%
mutate(DECONTAM = case_when(
FeatureID %in% list_of_contam_asvs ~ "FAIL",
TRUE ~ "PASS"
))Started with 17934 ASVs, post-decontamination, we have 17880 (a loss of 54 ASVs).
Data started with 3817219 sequences, after removing 54 ASVs, we have 3799135 total sequences. There was a total loss of 0.473748% of sequences.
plot_grid(asv_wtax_wstats %>%
filter(value > 0) %>%
ggplot(aes(x = SAMPLE, fill = DECONTAM)) +
geom_bar(stat = "count", width = 0.9, color = "black") +
labs(y = "Total ASVs") +
coord_flip() +
theme_linedraw() +
facet_grid(DATASET ~ ., scale = "free", space = "free") +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 1),
strip.background = element_blank(), strip.text = element_text(color = "black"),
legend.position = "bottom"),
asv_wtax_wstats %>%
group_by(SAMPLE, SITE, DECONTAM, DATASET) %>%
summarise(SUM_SEQ_DOMAIN = sum(value)) %>%
ggplot(aes(x = SAMPLE, y = SUM_SEQ_DOMAIN, fill = DECONTAM)) +
geom_bar(stat = "identity", color = "black", width = 0.9) +
labs(y = "Total Sequences") +
coord_flip() +
theme_linedraw() +
facet_grid(DATASET ~ ., scale = "free", space = "free") +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 1),
strip.background = element_blank(), strip.text = element_text(color = "black"),
legend.position = "bottom"),
ncol = 2) This plot shows the distribution of ASVs and sequences that failed or passed the decontamination step. Most obvious are the control samples that indicated the potentially contaminate ASVs.
# colnames(asv_wtax_wstats)
# unique(asv_wtax_wstats$SAMPLE)
sites <- c("Piccard", "VonDamm", "Axial", "GordaRidge")
asv_insitu <- asv_wtax_wstats %>% filter(Sample_or_Control != "Control") %>%
filter(SITE %in% sites) %>%
filter(!grepl("_expTf_", SAMPLE)) %>%
filter(value > 0) %>%
filter(DECONTAM == "PASS")
# Get quick stats on totals
sum(asv_insitu$value) # 3.8 million sequences## [1] 3799135
length(unique(asv_insitu$FeatureID)) #12,379 ASVs## [1] 12379
Additional sample QC, check replicates, and determine if replicates should be averaged.
plot_grid(asv_insitu %>%
group_by(SAMPLENAME, VENT, DATASET, Domain) %>%
summarise(seqsum_var = sum(value),
asvcount_var = n()) %>%
pivot_longer(ends_with("_var"), names_to = "VARIABLE") %>%
ggplot(aes(x = SAMPLENAME, y = value, fill = Domain)) +
geom_bar(color = "black", stat = "identity", position = "fill") +
facet_grid(VARIABLE ~ DATASET, space = "free", scales = "free") +
scale_y_continuous(expand = c(0,0)) +
theme_linedraw() +
scale_fill_brewer(palette = "Paired") +
theme(strip.background = element_blank(), strip.text = element_text(color = "black"),
axis.text.x = element_text(color = "black", angle = 90, hjust = 1, vjust = 0.5),
legend.position = "bottom"),
asv_insitu %>%
group_by(SAMPLENAME, VENT, DATASET, Domain) %>%
summarise(seqsum_var = sum(value),
asvcount_var = n()) %>%
pivot_longer(ends_with("_var"), names_to = "VARIABLE") %>%
ggplot(aes(x = SAMPLENAME, y = value, fill = Domain)) +
geom_bar(color = "black", stat = "identity", position = "stack") +
facet_grid(VARIABLE ~ DATASET, space = "free_x", scales = "free") +
scale_y_continuous(expand = c(0,0)) +
theme_linedraw() +
scale_fill_brewer(palette = "Paired") +
theme(strip.background = element_blank(), strip.text = element_text(color = "black"),
axis.text.x = element_text(color = "black", angle = 90, hjust = 1, vjust = 0.5),
legend.position = "bottom"),
ncol = 2)asv_insitu %>%
filter(Domain == "Eukaryota") %>%
# unite(SampleIdentifier, VENT, SAMPLETYPE, sep = " ", remove = FALSE) %>%
mutate(Supergroup = ifelse(is.na(Supergroup), "Unknown Eukaryota", Supergroup),
Phylum = ifelse(is.na(Phylum), "Unknown", Phylum),
Phylum = ifelse(Phylum == "Alveolata_X", "Ellobiopsidae", Phylum),
Supergroup = ifelse(Supergroup == "Alveolata", paste(Supergroup, Phylum, sep = "-"), Supergroup)) %>%
group_by(FeatureID, Taxon, Domain, Supergroup, Phylum, Class, Order, Family, Genus, Species,
VENT, SITE, SAMPLETYPE, YEAR, DATASET) %>%
summarise(SEQ_AVG_REP = mean(value)) %>%
ungroup() %>%
group_by(SITE, SAMPLETYPE, VENT, Supergroup) %>%
summarise(SEQ_SUM = sum(SEQ_AVG_REP)) %>%
ggplot(aes(x = VENT, y = SEQ_SUM, fill = Supergroup)) +
geom_bar(stat = "identity", position = "stack", color = "black", width = 0.9) +
facet_grid(. ~ SITE + SAMPLETYPE, scale = "free", space = "free") +
theme_linedraw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 1),
strip.background = element_blank(), strip.text = element_text(color = "black")) +
scale_y_continuous(expand = c(0,0)) +
scale_fill_manual(values = c("#f1eef6", "#d7b5d8", "#df65b0", "#ce1256", "#fc9272", "#ef3b2c",
"#800026", "#fff7bc", "#fec44f", "#d95f0e", "#a63603", "#74c476", "#238b45",
"#00441b", "#7fcdbb", "#084081", "#c6dbef", "#2b8cbe", "#016c59", "#bcbddc",
"#807dba", "#54278f", "#bdbdbd", "black")) Repeat taxonomy barplot, but with relative abundance
asv_insitu %>%
filter(Domain == "Eukaryota") %>%
# unite(SampleIdentifier, VENT, SAMPLETYPE, sep = " ", remove = FALSE) %>%
mutate(Supergroup = ifelse(is.na(Supergroup), "Unknown Eukaryota", Supergroup),
Phylum = ifelse(is.na(Phylum), "Unknown", Phylum),
Phylum = ifelse(Phylum == "Alveolata_X", "Ellobiopsidae", Phylum),
Supergroup = ifelse(Supergroup == "Alveolata", paste(Supergroup, Phylum, sep = "-"), Supergroup)) %>%
group_by(FeatureID, Taxon, Domain, Supergroup, Phylum, Class, Order, Family, Genus, Species,
VENT, SITE, SAMPLETYPE, YEAR, DATASET) %>%
summarise(SEQ_AVG_REP = mean(value)) %>%
ungroup() %>%
group_by(SITE, SAMPLETYPE, VENT, Supergroup) %>%
summarise(SEQ_SUM = sum(SEQ_AVG_REP)) %>%
ggplot(aes(x = VENT, y = SEQ_SUM, fill = Supergroup)) +
geom_bar(stat = "identity", position = "fill", color = "black", width = 0.9) +
facet_grid(. ~ SITE + SAMPLETYPE, scale = "free", space = "free") +
theme_linedraw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 1),
strip.background = element_blank(), strip.text = element_text(color = "black")) +
scale_y_continuous(expand = c(0,0)) +
scale_fill_manual(values = c("#f1eef6", "#d7b5d8", "#df65b0", "#ce1256", "#fc9272", "#ef3b2c",
"#800026", "#fff7bc", "#fec44f", "#d95f0e", "#a63603", "#74c476", "#238b45",
"#00441b", "#7fcdbb", "#084081", "#c6dbef", "#2b8cbe", "#016c59", "#bcbddc",
"#807dba", "#54278f", "#bdbdbd", "black"))Filter samples so that the total number of sequences is greater than 20,000 sequences.
# head(asv_insitu)
# unique(asv_insitu$Sample_or_Control)
# hist(asv_insitu$value)
tmp <- (asv_insitu %>%
group_by(SAMPLE, SAMPLENAME) %>%
summarise(SUM = sum(value)) %>%
filter(SUM < 20000))
toofew <- as.character(unique(tmp$SAMPLE))
toofew## [1] "Axial_Dependable_FS900_2013"
## [2] "GordaRidge_BSW020_sterivex_2019_REPa"
Samples: Axial_Dependable_FS900_2013 and GordaRidge_BSW020_sterivex_2019_REPa removed due to too few sequences.
Final table reporting total sequences and ASVs for each sample.
asv_insitu_qc <- asv_insitu %>%
filter(!(SAMPLE %in% toofew)) %>%
filter(value > 0)
asv_insitu_qc %>%
group_by(SAMPLEID, VENT, DATASET, SITE, SAMPLETYPE, YEAR) %>%
summarise(SEQ_SUM = sum(value),
ASV_COUNT = n()) %>%
ungroup() %>%
gt(
groupname_col = c("DATASET", "SITE", "YEAR"),
rowname_col = "SAMPLEID"
) %>%
tab_header(title = "Final sequence & ASV count")| Final sequence & ASV count | ||||
|---|---|---|---|---|
| VENT | SAMPLETYPE | SEQ_SUM | ASV_COUNT | |
| MCR - VonDamm - 2020 | ||||
| ArrowLoop | Vent | 115543 | 929 | |
| BSW | Background | 118859 | 860 | |
| MustardStand | Vent | 153337 | 53 | |
| OldManTree | Vent | 144857 | 598 | |
| Plume | Plume | 150840 | 1163 | |
| Rav2 | Vent | 286221 | 1785 | |
| ShrimpHole | Vent | 164840 | 960 | |
| WhiteCastle | Vent | 171000 | 32 | |
| X18 | Vent | 169273 | 637 | |
| MCR - Piccard - 2020 | ||||
| BSW | Background | 138994 | 644 | |
| LotsOShrimp | Vent | 204055 | 686 | |
| Plume | Plume | 121820 | 332 | |
| Shrimpocalypse | Vent | 146098 | 705 | |
| Axial - Axial - 2015 | ||||
| AnemonePlume | Anemone Plume | Plume | 33599 | 474 |
| BSW1500m | Deep seawater | Background | 48604 | 492 |
| FS915 | Marker113 | Vent | 63629 | 655 |
| GR - GordaRidge - 2019 | ||||
| BSW056 | Deep seawater | Background | 25095 | 498 |
| BSW081 | Shallow seawater | Background | 38180 | 571 |
| Plume001 | Near vent BW | Background | 104038 | 1201 |
| Plume036 | Candelabra Plume | Plume | 57514 | 472 |
| Plume096 | Mt Edwards Plume | Plume | 38211 | 582 |
| Vent009 | Mt Edwards | Vent | 42591 | 245 |
| Vent010 | Mt Edwards | Vent | 57564 | 558 |
| Vent011 | Mt Edwards | Vent | 71398 | 664 |
| Vent039 | Venti Latte | Vent | 45588 | 486 |
| Vent040 | Venti Latte | Vent | 57186 | 688 |
| Vent041 | Venti Latte | Vent | 64666 | 833 |
| Vent086 | Candelabra | Vent | 57357 | 720 |
| Vent087 | Candelabra | Vent | 45780 | 661 |
| Vent088 | Candelabra | Vent | 40699 | 630 |
| Vent105 | Sir Ventsalot | Vent | 57955 | 455 |
| Vent106 | Sir Ventsalot | Vent | 41174 | 652 |
| Vent107 | Sir Ventsalot | Vent | 43170 | 587 |
| Axial - Axial - 2013 | ||||
| FS891 | Anemone | Vent | 38672 | 449 |
| FS896 | El Guapo | Vent | 57160 | 804 |
| FS898 | N3Area | Vent | 53609 | 608 |
| FS902 | Skadi | Vent | 43916 | 40 |
| FS903 | Marker113 | Vent | 77470 | 684 |
| FS904 | Marker33 | Vent | 51169 | 473 |
| FS905 | Boca | Vent | 190524 | 2172 |
| Axial - Axial - 2014 | ||||
| FS906 | Marker113 | Vent | 33394 | 333 |
| FS908 | Marker33 | Vent | 59502 | 771 |
| FS910 | Skadi | Vent | 73927 | 988 |
Set up analysis to classify each ASV based on distribution
head(asv_insitu_qc)## # A tibble: 6 × 35
## FeatureID SAMPLE value Taxon Domain Supergroup Phylum Class Order Family Genus
## <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 00056209… Gorda… 8 Euka… Eukar… Stramenop… Sagen… <NA> <NA> <NA> <NA>
## 2 00056209… Gorda… 13 Euka… Eukar… Stramenop… Sagen… <NA> <NA> <NA> <NA>
## 3 00096455… Gorda… 91 Euka… Eukar… Rhizaria Radio… Acan… <NA> <NA> <NA>
## 4 000ee377… Axial… 282 Euka… Eukar… Alveolata Cilio… Nass… Nass… Disco… NASS…
## 5 000ee377… Axial… 32 Euka… Eukar… Alveolata Cilio… Nass… Nass… Disco… NASS…
## 6 00165708… Gorda… 1 Euka… Eukar… Stramenop… Ochro… Pela… Pela… Pelag… Pela…
## # … with 24 more variables: Species <chr>, Consensus <dbl>, SAMPLENAME <chr>,
## # VENT <chr>, COORDINATES <chr>, SITE <chr>, Sample_or_Control <chr>,
## # SAMPLEID <chr>, DEPTH <chr>, SAMPLETYPE <chr>, YEAR <int>, TEMP <dbl>,
## # pH <dbl>, PercSeawater <dbl>, Mg <dbl>, NO3 <dbl>, H2 <dbl>, H2S <dbl>,
## # CH4 <dbl>, ProkConc <dbl>, Sample_actual <chr>, Type <chr>, DATASET <chr>,
## # DECONTAM <chr>
# head(insitu_wide)
# unique(asv_insitu_qc$SAMPLETYPE)
# unique(asv_insitu_qc$SITE)
tax_asv_id <- asv_insitu_qc %>%
filter(value > 0) %>% #remove zero values
select(FeatureID, SITE, SAMPLETYPE) %>% # isolate only ASVs that are PRESENT at a site and sampletype
distinct() %>% #unique this, as presense = present in at least 1 rep (where applicable)
unite(sample_id, SITE, SAMPLETYPE, sep = "_") %>%
# select(-SITE) %>%
# distinct() %>%
add_column(present = 1) %>%
pivot_wider(names_from = sample_id, values_from = present, values_fill = 0) %>%
rowwise() %>%
mutate_at(vars(FeatureID), factor)Is an ASV present only at the vent site? plume? or background? What about background and plume only?
library(purrr)
any_cols <- function(tax_asv_id) reduce(tax_asv_id, `|`)
asv_class <- tax_asv_id %>%
mutate(vent = ifelse(any_cols(across(contains("_Vent"), ~ . > 0)), "VENT", ""),
plume= ifelse(any_cols(across(contains("_Plume"), ~ . > 0)), "PLUME", ""),
bsw = ifelse(any_cols(across(contains("_Background"), ~ . > 0)), "BSW", ""),
) %>%
unite(class_tmp, vent, plume, bsw, sep = "_", na.rm = TRUE) %>%
mutate(CLASS = case_when(
class_tmp == "VENT__" ~ "Vent only",
class_tmp == "_PLUME_" ~ "Plume only",
class_tmp == "__BSW" ~ "Background only",
class_tmp == "VENT__BSW" ~ "Vent & background",
class_tmp == "VENT_PLUME_BSW" ~ "Vent, plume, & background",
class_tmp == "VENT_PLUME_" ~ "Vent & plume",
class_tmp == "_PLUME_BSW" ~ "Plume & background"
)) %>%
select(FeatureID, CLASS) %>% distinct()
colnames(tax_asv_id)## [1] "FeatureID" "GordaRidge_Vent" "GordaRidge_Background"
## [4] "Axial_Vent" "VonDamm_Vent" "GordaRidge_Plume"
## [7] "VonDamm_Plume" "Piccard_Vent" "Piccard_Background"
## [10] "Axial_Plume" "VonDamm_Background" "Axial_Background"
## [13] "Piccard_Plume"
Binary data frame with 1 indicating presence of ASV (rows) in a given sample (columns)
# head(tax_asv_id)Depending on prevalence of ASV, assign groupings of location.
asv_class_SITE <- tax_asv_id %>%
mutate(mcr = ifelse(any_cols(across(contains("Piccard") | contains("VonDamm"), ~ . > 0)), "MCR", ""),
axial = ifelse(any_cols(across(contains("Axial"), ~ . > 0)), "AxS", ""),
gr = ifelse(any_cols(across(contains("Gorda"), ~ . > 0)), "GR", "")
) %>%
unite(class_tmp, mcr, axial, gr, sep = "_", na.rm = TRUE) %>%
mutate(SITE_CLASS = case_when(
class_tmp == "__GR" ~ "Gorda Ridge only",
class_tmp == "_AxS_" ~ "Axial only",
class_tmp == "_AxS_GR" ~ "Axial & Gorda Ridge",
class_tmp == "MCR__" ~ "Mid-Cayman Rise",
class_tmp == "MCR__GR" ~ "Mid-Cayman Rise & Gorda Ridge",
class_tmp == "MCR_AxS_" ~ "Mid-Cayman Rise & Axial",
class_tmp == "MCR_AxS_GR" ~ "All sites"
)) %>%
select(FeatureID, SITE_CLASS) %>% distinct()Combine together with original ASV table
insitu_asv_wClass <- asv_insitu_qc %>%
left_join(asv_class) %>%
left_join(asv_class_SITE)
# head(insitu_asv_wClass)Visualize the total number of ASVs in background, plume, versus background.
# head(asv_insitu_qc)
# svg("bubbles.svg", h = 4, w = 8)
asv_insitu_qc %>%
select(DATASET, FeatureID, SAMPLETYPE) %>%
group_by(DATASET, SAMPLETYPE) %>%
summarise(COUNT = n_distinct(FeatureID)) %>%
ggplot(aes(x = DATASET, y = SAMPLETYPE, fill = SAMPLETYPE)) +
geom_point(aes(size = COUNT), shape = 21, color = "black") +
scale_size_continuous(range = c(4,20)) +
scale_fill_viridis_d(option = "B") +
theme_void() +
theme(legend.position = "right",
axis.text = element_text(color = "black"))# dev.off()Bubble plot reporting the total number of ASVs found in the vent, plume, versus background. At each site, the vent protist population had a higher total number of ASVs.
Repeat visualization by ASV distribution category.
# head(insitu_asv_wClass)
insitu_asv_wClass %>%
select(DATASET, FeatureID, SAMPLETYPE, CLASS) %>%
group_by(DATASET, SAMPLETYPE, CLASS) %>%
summarise(COUNT = n_distinct(FeatureID)) %>%
ggplot(aes(x = DATASET, y = SAMPLETYPE, fill = SAMPLETYPE)) +
geom_point(aes(size = COUNT), shape = 21, color = "black") +
scale_size_continuous(range = c(4,20)) +
scale_fill_viridis_d(option = "B") +
theme_void() +
theme(legend.position = "right",
axis.text.x = element_text(color = "black"),
axis.title.y = element_blank()) +
facet_grid(SAMPLETYPE + CLASS ~ ., scales = "free", space = "free") +
labs(x = "", y = "", title = "Total number of ASVs by distribution & sample type") Repeated bubble plot reports the total number of ASVs in the vent, plume, and background - but now further separated by distribution (i.e., if an ASV was found only in the vent and plume = “Vent & plume”). The largest portion of ASVs were found only at the vent sites (Vent only).
Categories for ASV distribution:
unique(insitu_asv_wClass$CLASS)## [1] "Vent only" "Background only"
## [3] "Vent & background" "Vent, plume, & background"
## [5] "Plume only" "Vent & plume"
## [7] "Plume & background"
unique(insitu_asv_wClass$SITE_CLASS)## [1] "Gorda Ridge only" "Axial only"
## [3] "Mid-Cayman Rise" "Mid-Cayman Rise & Axial"
## [5] "Axial & Gorda Ridge" "Mid-Cayman Rise & Gorda Ridge"
## [7] "All sites"
Checkpoint to save working dataframes.
save(asv_insitu, asv_insitu_qc, insitu_asv_wClass, file = "asv-tables-processed-18102021.RData")To explore microbial eukaryotic community diversity at all three sites, below functions have been written to pass 18S data for each site through the same analysis. This will be done for all sites together and for them individually.
Sections below highlight Axial Seamount, Mid-Cayman Rise, and Gorda Ridge data individually.
axial <- c("Axial")
mcr <- c("VonDamm", "Piccard")
gr <- c("GordaRidge")Create a bar plot showing the relative sequence abundance of 18S results to the Supergroup and Phylum level. Function averages across replicates and then sums to the phylum and supergroup level. Bar plot shows the relative sequence abundance.
make_bar_relabun <- function(df, selection){
df_out <- df %>%
filter(SITE %in% selection) %>%
filter(Domain == "Eukaryota") %>%
mutate(Supergroup = ifelse(is.na(Supergroup), "Unknown Eukaryota", Supergroup),
Phylum = ifelse(is.na(Phylum), "Unknown", Phylum),
Phylum = ifelse(Phylum == "Alveolata_X", "Ellobiopsidae", Phylum),
Supergroup = ifelse(Supergroup == "Alveolata", paste(Supergroup, Phylum, sep = "-"), Supergroup)) %>%
group_by(FeatureID, Taxon, Domain, Supergroup, Phylum, Class, Order, Family, Genus, Species,
VENT, SITE, SAMPLETYPE, YEAR, DATASET) %>%
summarise(SEQ_AVG_REP = mean(value)) %>%
ungroup()
supergroup <- df_out %>%
group_by(SITE, SAMPLETYPE, VENT, YEAR, Supergroup) %>%
summarise(SEQ_SUM = sum(SEQ_AVG_REP)) %>%
ggplot(aes(x = VENT, y = SEQ_SUM, fill = Supergroup)) +
geom_bar(stat = "identity", position = "fill", color = "black", width = 0.9) +
facet_grid(. ~ SITE +YEAR + SAMPLETYPE, scale = "free", space = "free") +
theme_linedraw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
strip.background = element_blank(), strip.text = element_text(color = "black"),
legend.position = "right") +
scale_y_continuous(expand = c(0,0)) +
# scale_fill_brewer(palette = "Set2") +
scale_fill_manual(values = c("#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525", "#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525")) +
labs(x = "", y = "Relative abundance")
phylum <- df_out %>%
unite(SupergroupPhylum, Supergroup, Phylum, sep = "-") %>%
group_by(SITE, SAMPLETYPE, VENT, YEAR, SupergroupPhylum) %>%
summarise(SEQ_SUM = sum(SEQ_AVG_REP)) %>%
ggplot(aes(x = VENT, y = SEQ_SUM, fill = SupergroupPhylum)) +
geom_bar(stat = "identity", position = "fill", color = "black", width = 0.9) +
facet_grid(. ~ SITE +YEAR + SAMPLETYPE, scale = "free", space = "free") +
theme_linedraw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
strip.background = element_blank(), strip.text = element_text(color = "black"),
legend.position = "right") +
scale_y_continuous(expand = c(0,0)) +
scale_fill_manual(values = c("#f1eef6", "#d7b5d8", "#df65b0", "#ce1256", "#fc9272", "#ef3b2c",
"#800026", "#fff7bc", "#fec44f", "#d95f0e", "#a63603", "#74c476", "#238b45",
"#00441b", "#7fcdbb", "#084081", "#c6dbef", "#2b8cbe", "#016c59", "#bcbddc",
"#807dba", "#54278f", "#bdbdbd", "black", "white", "#969696", "#525252", "#f1eef6", "#d7b5d8", "#df65b0", "#ce1256", "#fc9272", "#ef3b2c",
"#800026", "#fff7bc", "#fec44f", "#d95f0e", "#a63603", "#74c476", "#238b45",
"#00441b", "#7fcdbb", "#084081", "#c6dbef", "#2b8cbe", "#016c59", "#bcbddc",
"#807dba", "#54278f", "#bdbdbd", "black", "white")) +
labs(x = "", y = "Relative abundance")
supergroup + phylum + patchwork::plot_layout(ncol = 1)
}
# make_bar_relabun(insitu_asv_wClass, axial)Relative abundance plots are misleading, as this tag-sequence data is compositional. To combat this, we can also perform a center log-ratio transformation of the sequence counts. This tile plot (or heat map) will show the relationship from the data mean. Positive values thus demonstrate an increase in the taxa, while negative values illustrate the opposite.
# Not sure if I need this
tax_key <- insitu_asv_wClass %>%
select(FeatureID, Taxon, Domain, Supergroup, Phylum, Class, Order, Family, Genus, Species, CLASS, SITE_CLASS) %>%
distinct()Ahead of the CLR transformation, average across replicates, then sum to the Class level. THEN perform CLR transformation and plot as heat map.
make_clr_trans_tile <- function(df, selection){
df_wide <- df %>%
filter(SITE %in% selection) %>%
# df_wide <- insitu_asv_wClass %>%
# filter(SITE %in% axial) %>%
filter(Domain == "Eukaryota") %>%
# Average across replicates
group_by(FeatureID, SAMPLENAME, VENT, Domain, Supergroup, Phylum, Class, Order, Family, Genus, Species) %>%
summarise(AVG = mean(value)) %>%
ungroup() %>%
# Sum to the Order taxonomic classification
unite(SAMPLENAME_2, SAMPLENAME, VENT, sep = "_") %>%
group_by(SAMPLENAME_2, Supergroup, Phylum, Class) %>%
summarise(CLASS_SUM = sum(AVG)) %>%
unite(CLASS, Supergroup, Phylum, Class, sep = " ") %>%
select(CLASS, SAMPLENAME_2, CLASS_SUM) %>%
pivot_wider(names_from = SAMPLENAME_2, values_from = CLASS_SUM, values_fill = 0) %>%
column_to_rownames(var = "CLASS")
## Take wide data frame and CLR transform, pivot to wide, and plot
data.frame(compositions::clr(df_wide)) %>%
rownames_to_column(var = "CLASS") %>%
pivot_longer(cols = starts_with(selection), values_to = "CLR", names_to = "SAMPLENAME_2") %>%
separate(SAMPLENAME_2, c("SAMPLENAME", "VENT"), sep = "_") %>%
separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>%
mutate(VENT = str_replace_all(VENT, "\\.", " ")) %>%
mutate(REGION = case_when(
SITE == "GordaRidge" ~ "Gorda Ridge",
SITE %in% mcr ~ "Mid-Cayman Rise",
SITE == "Axial" ~ "Axial")) %>%
mutate(VENTNAME = case_when(
REGION == "Gorda Ridge" ~ VENT,
REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
)) %>% select(-Sample_tmp) %>%
unite(SAMPLE, REGION, VENTNAME, sep = " ", remove = FALSE) %>%
separate(CLASS, c("Supergroup", "Phylum", "Class"), sep = " ", remove = FALSE) %>%
ggplot(aes(x = SAMPLE, y = Class, fill = CLR)) +
geom_tile(color = "#252525") +
theme(legend.position = "right",
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, color = "black",size = 8),
axis.text.y = element_text(color = "black", size = 8),
strip.background = element_blank(),
strip.text.y = element_text(hjust = 0, vjust = 0.5, angle = 0),
# strip.text.x = element_blank(),
legend.title = element_blank()) +
labs(x = "", y = "") +
scale_fill_gradient2(low = "#4575b4", mid = "white", high = "#d73027", na.value = "grey50") +
facet_grid(Supergroup + Phylum ~ SAMPLETYPE, space = "free", scales = "free")
}Similar to aove, the first step in this function transforms data using CLR (to ASV level though). First plot will show eigen values (scree plot to determine if 2 vs. 3 dimensions is best for data). Then function extracts data points and creates PCA plot.
make_pca <- function(df, selection){
df_wide_asv <- df %>%
# df_wide_asv <- insitu_asv_wClass %>%
filter(SITE %in% selection) %>%
# filter(SITE %in% axial) %>%
filter(Domain == "Eukaryota") %>%
# Average across replicates
group_by(FeatureID, SAMPLENAME, VENT) %>%
summarise(AVG = mean(value)) %>%
ungroup() %>%
separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>%
mutate(REGION = case_when(
SITE == "GordaRidge" ~ "Gorda Ridge",
SITE %in% mcr ~ "Mid-Cayman Rise",
SITE == "Axial" ~ "Axial")) %>%
mutate(VENTNAME = case_when(
REGION == "Gorda Ridge" ~ VENT,
REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
)) %>% select(-Sample_tmp) %>%
unite(SAMPLE, SAMPLETYPE, REGION, VENTNAME, sep = "_", remove = FALSE) %>%
group_by(FeatureID, SAMPLE) %>%
summarise(SUM = sum(AVG)) %>%
pivot_wider(names_from = FeatureID, values_from = SUM, values_fill = 0) %>%
column_to_rownames(var = "SAMPLE")
# look at eigenvalues
pca_lr <- prcomp(data.frame(compositions::clr(df_wide_asv)))
variance_lr <- (pca_lr$sdev^2)/sum(pca_lr$sdev^2)
## View bar plot
barplot(variance_lr, main = "Log-Ratio PCA Screeplot", xlab = "PC Axis", ylab = "% Variance",
cex.names = 1.5, cex.axis = 1.5, cex.lab = 1.5, cex.main = 1.5)
## Extract PCR points
data.frame(pca_lr$x, SAMPLE = rownames(pca_lr$x)) %>%
separate(SAMPLE, c("SAMPLETYPE", "REGION", "VENTNAME"), sep = "_", remove = FALSE) %>%
## Generate PCA plot
ggplot(aes(x = PC1, y = PC2, shape = SAMPLETYPE, fill = VENTNAME)) +
geom_hline(yintercept = 0) + geom_vline(xintercept = 0, color = "#525252") +
geom_point(size=4, stroke = 1, aes(fill = VENTNAME)) +
ylab(paste0('PC2 ',round(variance_lr[2]*100,2),'%')) +
xlab(paste0('PC1 ',round(variance_lr[1]*100,2),'%')) +
scale_shape_manual(values = c(21, 23, 24)) +
# scale_fill_manual(values = fill_color) +
# scale_color_manual(values = color_color) +
theme_bw() +
theme(axis.text = element_text(color="black", size=12),
legend.title = element_blank(),
axis.title = element_text(color="black", size=14),
legend.text = element_text(color = "black", size = 14),
plot.margin = margin(2, 1, 2, 1, "cm")) +
guides(fill = guide_legend(override.aes = list(shape = 21) ),
shape = guide_legend(override.aes = list(fill = "black")))
}From complete dataset, average across replicates, then sum the total number of ASVs in each sample. Then plot a data point for total number of ASVs (ASV richness) by sample type - where sample type represents the vent, plume, vs. background. Box plots show the median and range.
make_asv_rich <- function(df, selection){
df %>%
filter(SITE %in% selection) %>%
filter(Domain == "Eukaryota") %>%
# Average across replicates
group_by(FeatureID, SAMPLENAME, VENT, Supergroup) %>%
summarise(AVG = mean(value)) %>%
separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>%
mutate(REGION = case_when(
SITE == "GordaRidge" ~ "Gorda Ridge",
SITE %in% mcr ~ "Mid-Cayman Rise",
SITE == "Axial" ~ "Axial")) %>%
mutate(VENTNAME = case_when(
REGION == "Gorda Ridge" ~ VENT,
REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
)) %>% select(-Sample_tmp) %>%
unite(SAMPLE, REGION, VENTNAME, sep = " ", remove = FALSE) %>%
ungroup() %>%
group_by(REGION, SAMPLE, SAMPLETYPE) %>%
summarise(NUM_ASV = n()) %>%
ggplot(aes(x = SAMPLETYPE, y = NUM_ASV, shape = SAMPLETYPE)) +
geom_boxplot(aes(group = SAMPLETYPE), alpha = 0.8, width = 0.4) +
geom_jitter(size=2, aes(fill = REGION)) +
scale_shape_manual(values = c(21, 23, 24)) +
theme_bw() +
theme(axis.text = element_text(color="black", size=12),
legend.title = element_blank(),
axis.title = element_text(color="black", size=14),
legend.text = element_text(color = "black", size = 14)) +
guides(fill = guide_legend(override.aes = list(shape = 21) ),
shape = guide_legend(override.aes = list(fill = "black") ) ) +
labs(x = "", y = "Total number of ASVs")
}Bar plot (colors correspond to Supergroup) represents the number of ASVs shared or unique to each sample. Combination matrix below bars shows which samples are considered for the bar plot.
make_upset_plot <- function(df, selection){
df %>%
filter(SITE %in% selection) %>%
filter(Domain == "Eukaryota") %>%
# Average across replicates
group_by(FeatureID, SAMPLENAME, VENT, Supergroup) %>%
summarise(AVG = mean(value)) %>%
separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>%
mutate(REGION = case_when(
SITE == "GordaRidge" ~ "Gorda Ridge",
SITE %in% mcr ~ "Mid-Cayman Rise",
SITE == "Axial" ~ "Axial")) %>%
mutate(VENTNAME = case_when(
REGION == "Gorda Ridge" ~ VENT,
REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
)) %>% select(-Sample_tmp) %>%
unite(SAMPLE, REGION, VENTNAME, sep = " ", remove = FALSE) %>%
distinct(FeatureID, Supergroup, AVG, SAMPLE, .keep_all = TRUE) %>%
group_by(FeatureID, Supergroup) %>%
summarise(SAMPLE = list(SAMPLE)) %>%
ggplot(aes(x = SAMPLE)) +
geom_bar(color = "black", width = 0.5, aes(fill = Supergroup)) +
scale_x_upset(n_intersections = 35) +
scale_y_continuous(expand = c(0,0)) +
labs(x = "", y = "Shared ASVs") +
theme_linedraw() +
theme(axis.text = element_text(color="black", size=10),
axis.title = element_text(color="black", size=10),
legend.text = element_text(color = "black", size = 10),
plot.margin = margin(1, 1, 1, 5, "cm")) +
scale_fill_manual(values = c("#f1eef6", "#d7b5d8", "#df65b0", "#ce1256", "#fc9272", "#ef3b2c",
"#800026", "#fff7bc", "#fec44f", "#d95f0e", "#a63603", "#74c476", "#238b45",
"#00441b", "#7fcdbb", "#084081", "#c6dbef", "#2b8cbe", "#016c59", "#bcbddc",
"#807dba", "#54278f", "#bdbdbd", "black"))
}make_bar_relabun(insitu_asv_wClass, axial)Axial Seamount samples from archived material - span 2013, 2014, and 2015. First, the background and plume (from 2015 only, and from plume associated with the Anemone vent) are different from the vent samples - overwhelmingly stramneopile and rhizaria. For the background and plume, the stramenopiles appear to be associated with ochrophyta or opalozoa. For the plume, the rhizaria population was associated with cercozoa, while the background seawater was identified as belonging to radiolaria.
The major difference between the background/plume and vent sites was the higher relative sequence abundance of ciliates and opisthokonta. For the opisthokonta, these are primarily metazoa - and I will need to investigate this further. Exceptions for this include the ‘Dependable’ vent from 2013, which had a completely different composition, and ‘Marker 113’ in 2015, which the opisthokonta sequences were assigned choanoflagellate and fungi.
Further questions to consider
Any geochemical changes to Marker 113 from 2013/2014 to 2015? Could attribute difference of opisthokonta colonization.
Need to get metadata
make_clr_trans_tile(insitu_asv_wClass, axial)## Warning: Expected 4 pieces. Additional pieces discarded in 1872 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
Tile plot goes to the Class taxonomic level. Here at Axial, mostly the ciliate class had higher CLR values (more enriched relative to the data mean). Second to ciliates were cercozoa. Also noticing how Marker 113 2013 and 2015 are more similar to each other than 2014?
make_pca(insitu_asv_wClass, axial)## Warning: Expected 4 pieces. Additional pieces discarded in 7716 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
While we only have 1 plume and bsw each for Axial, they are grouping together - away from vents. So that is an expected signature and likely consistent with the other sites. These colors are a little confusing, it does look like Boca is an outlier.
make_asv_rich(insitu_asv_wClass, axial)## Warning: Expected 4 pieces. Additional pieces discarded in 7716 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
We only have 1 sample for background and plume from Axial Seamount. But this shows that the vent sites have varied ASV richness,
make_upset_plot(insitu_asv_wClass, axial)## Warning: Expected 4 pieces. Additional pieces discarded in 7716 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
## Warning: Removed 876 rows containing non-finite values (stat_count).
make_bar_relabun(insitu_asv_wClass, mcr)make_clr_trans_tile(insitu_asv_wClass, mcr)## Warning: Expected 4 pieces. Additional pieces discarded in 1794 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
make_pca(insitu_asv_wClass, mcr)## Warning: Expected 4 pieces. Additional pieces discarded in 8334 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
make_asv_rich(insitu_asv_wClass, mcr)## Warning: Expected 4 pieces. Additional pieces discarded in 8334 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
make_upset_plot(insitu_asv_wClass, mcr)## Warning: Expected 4 pieces. Additional pieces discarded in 8334 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
## Warning: Removed 909 rows containing non-finite values (stat_count).
make_bar_relabun(insitu_asv_wClass, gr)make_clr_trans_tile(insitu_asv_wClass, gr)## Warning: Expected 4 pieces. Additional pieces discarded in 2210 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
make_pca(insitu_asv_wClass, gr)## Warning: Expected 4 pieces. Additional pieces discarded in 9453 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
make_asv_rich(insitu_asv_wClass, gr)## Warning: Expected 4 pieces. Additional pieces discarded in 9453 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
make_upset_plot(insitu_asv_wClass, gr)## Warning: Expected 4 pieces. Additional pieces discarded in 9453 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
## Warning: Removed 925 rows containing non-finite values (stat_count).
all <- c("Axial", "VonDamm", "Piccard", "GordaRidge")
mcr <- c("VonDamm", "Piccard")make_bar_relabun(insitu_asv_wClass, all)make_clr_trans_tile(insitu_asv_wClass, all)## Warning: Expected 4 pieces. Additional pieces discarded in 7095 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
make_pca(insitu_asv_wClass, all)## Warning: Expected 4 pieces. Additional pieces discarded in 25503 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
make_asv_rich(insitu_asv_wClass, all)## Warning: Expected 4 pieces. Additional pieces discarded in 25503 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
make_upset_plot(insitu_asv_wClass, all)## Warning: Expected 4 pieces. Additional pieces discarded in 25503 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
## Warning: Removed 3659 rows containing non-finite values (stat_count).
# head(insitu_asv_wClass)Repeat above plot, but resolve by sample location and sample type.
insitu_asv_wClass %>%
filter(SITE %in% all) %>%
filter(Domain == "Eukaryota") %>%
# Average across replicates
group_by(FeatureID, SAMPLENAME, VENT, Supergroup) %>%
summarise(AVG = mean(value)) %>%
separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>%
mutate(REGION = case_when(
SITE == "GordaRidge" ~ "Gorda Ridge",
SITE %in% mcr ~ "Mid-Cayman Rise",
SITE == "Axial" ~ "Axial")) %>%
mutate(VENTNAME = case_when(
REGION == "Gorda Ridge" ~ VENT,
REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
)) %>% select(-Sample_tmp) %>%
unite(SAMPLE, REGION, SAMPLETYPE, sep = " ", remove = FALSE) %>%
group_by(FeatureID, Supergroup, SAMPLE) %>%
summarise(SUM = sum(AVG)) %>%
ungroup() %>%
distinct(FeatureID, Supergroup, SUM, SAMPLE, .keep_all = TRUE) %>%
group_by(FeatureID, Supergroup) %>%
summarise(SAMPLE = list(SAMPLE)) %>%
ggplot(aes(x = SAMPLE)) +
geom_bar(color = "black", width = 0.5, aes(fill = Supergroup)) +
scale_x_upset(n_intersections = 35) +
scale_y_continuous(expand = c(0,0)) +
labs(x = "", y = "Shared ASVs") +
theme_linedraw() +
theme(axis.text = element_text(color="black", size=10),
axis.title = element_text(color="black", size=10),
legend.text = element_text(color = "black", size = 10),
plot.margin = margin(1, 1, 1, 5, "cm")) +
scale_fill_manual(values = c("#f1eef6", "#d7b5d8", "#df65b0", "#ce1256", "#fc9272", "#ef3b2c",
"#800026", "#fff7bc", "#fec44f", "#d95f0e", "#a63603", "#74c476", "#238b45",
"#00441b", "#7fcdbb", "#084081", "#c6dbef", "#2b8cbe", "#016c59", "#bcbddc",
"#807dba", "#54278f", "#bdbdbd", "black"))## Warning: Expected 4 pieces. Additional pieces discarded in 25503 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
## Warning: Removed 828 rows containing non-finite values (stat_count).
Observations regarding above plot: - Axial and Gorda Ridge vent sites have more shared ASVs than any other pairwise comparison. After this, there were also many ASVs shared throughout MCR (vent, plume, + background). May be a reflection of sample size, as MCR had more vent sites - a small subset of ASVs were found at all vent sites or all samples. - ASVs within the vents had much higher unique # of ASVs (not shared with another habitat type) than any other sample type/location (furtherest left bars).
# head(insitu_asv_wClass) # from above, where I've classified each ASV by site and occurence in sample type
unique(insitu_asv_wClass$CLASS)## [1] "Vent only" "Background only"
## [3] "Vent & background" "Vent, plume, & background"
## [5] "Plume only" "Vent & plume"
## [7] "Plume & background"
unique(insitu_asv_wClass$SITE_CLASS)## [1] "Gorda Ridge only" "Axial only"
## [3] "Mid-Cayman Rise" "Mid-Cayman Rise & Axial"
## [5] "Axial & Gorda Ridge" "Mid-Cayman Rise & Gorda Ridge"
## [7] "All sites"
unique(insitu_asv_wClass$SAMPLETYPE)## [1] "Vent" "Background" "Plume"
What Supergroups are associated with resident vs. endemic? what about to specific sites?
make_bar_bycategory <- function(df, category, position){
CATEGORY <- enquo(category)
df_out <- df %>%
filter(Domain == "Eukaryota") %>%
mutate(Supergroup = ifelse(is.na(Supergroup), "Unknown Eukaryota", Supergroup),
Phylum = ifelse(is.na(Phylum), "Unknown", Phylum),
Phylum = ifelse(Phylum == "Alveolata_X", "Ellobiopsidae", Phylum),
Supergroup = ifelse(Supergroup == "Alveolata", paste(Supergroup, Phylum, sep = "-"), Supergroup)) %>%
group_by(FeatureID, Taxon, Domain, Supergroup, Phylum, Class, Order, Family, Genus, Species,
VENT, SITE, SAMPLETYPE, YEAR, DATASET, !!CATEGORY) %>%
summarise(SEQ_AVG_REP = mean(value)) %>%
ungroup()
## Supergroup
supergroup <- df_out %>%
group_by(Supergroup, !!CATEGORY) %>%
summarise(SEQ_SUM = sum(SEQ_AVG_REP)) %>%
ggplot(aes(x = !!CATEGORY, y = SEQ_SUM, fill = Supergroup)) +
geom_bar(stat = "identity", position = position, color = "black", width = 0.9) +
# facet_grid(. ~ SITE +YEAR + SAMPLETYPE, scale = "free", space = "free") +
theme_linedraw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
strip.background = element_blank(), strip.text = element_text(color = "black"),
legend.position = "right") +
scale_y_continuous(expand = c(0,0)) +
# scale_fill_brewer(palette = "Set2") +
scale_fill_manual(values = c("#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525", "#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525")) +
labs(x = "", y = "Relative abundance")
## Phylum
phylum <- df_out %>%
unite(SupergroupPhylum, Supergroup, Phylum, sep = "-") %>%
group_by(SupergroupPhylum, !!CATEGORY) %>%
summarise(SEQ_SUM = sum(SEQ_AVG_REP)) %>%
ggplot(aes(x = !!CATEGORY, y = SEQ_SUM, fill = SupergroupPhylum)) +
geom_bar(stat = "identity", position = position, color = "black", width = 0.9) +
# facet_grid(. ~ SITE +YEAR + SAMPLETYPE, scale = "free", space = "free") +
theme_linedraw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
strip.background = element_blank(), strip.text = element_text(color = "black"),
legend.position = "right") +
scale_y_continuous(expand = c(0,0)) +
scale_fill_manual(values = c("#f1eef6", "#d7b5d8", "#df65b0", "#ce1256", "#fc9272", "#ef3b2c",
"#800026", "#fff7bc", "#fec44f", "#d95f0e", "#a63603", "#74c476", "#238b45",
"#00441b", "#7fcdbb", "#084081", "#c6dbef", "#2b8cbe", "#016c59", "#bcbddc",
"#807dba", "#54278f", "#bdbdbd", "black", "white", "#969696", "#525252", "#f1eef6", "#d7b5d8", "#df65b0", "#ce1256", "#fc9272", "#ef3b2c",
"#800026", "#fff7bc", "#fec44f", "#d95f0e", "#a63603", "#74c476", "#238b45",
"#00441b", "#7fcdbb", "#084081", "#c6dbef", "#2b8cbe", "#016c59", "#bcbddc",
"#807dba", "#54278f", "#bdbdbd", "black", "white")) +
labs(x = "", y = "Relative abundance")
supergroup + phylum + patchwork::plot_layout(ncol = 1)
}make_bar_bycategory(insitu_asv_wClass, CLASS, "fill")make_bar_bycategory(insitu_asv_wClass, CLASS, "stack")make_bar_bycategory(insitu_asv_wClass, SITE_CLASS, "fill")make_bar_bycategory(insitu_asv_wClass, SITE_CLASS, "stack")head(insitu_asv_wClass)## # A tibble: 6 × 37
## FeatureID SAMPLE value Taxon Domain Supergroup Phylum Class Order Family Genus
## <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 00056209… Gorda… 8 Euka… Eukar… Stramenop… Sagen… <NA> <NA> <NA> <NA>
## 2 00056209… Gorda… 13 Euka… Eukar… Stramenop… Sagen… <NA> <NA> <NA> <NA>
## 3 00096455… Gorda… 91 Euka… Eukar… Rhizaria Radio… Acan… <NA> <NA> <NA>
## 4 000ee377… Axial… 282 Euka… Eukar… Alveolata Cilio… Nass… Nass… Disco… NASS…
## 5 000ee377… Axial… 32 Euka… Eukar… Alveolata Cilio… Nass… Nass… Disco… NASS…
## 6 00165708… Gorda… 1 Euka… Eukar… Stramenop… Ochro… Pela… Pela… Pelag… Pela…
## # … with 26 more variables: Species <chr>, Consensus <dbl>, SAMPLENAME <chr>,
## # VENT <chr>, COORDINATES <chr>, SITE <chr>, Sample_or_Control <chr>,
## # SAMPLEID <chr>, DEPTH <chr>, SAMPLETYPE <chr>, YEAR <int>, TEMP <dbl>,
## # pH <dbl>, PercSeawater <dbl>, Mg <dbl>, NO3 <dbl>, H2 <dbl>, H2S <dbl>,
## # CH4 <dbl>, ProkConc <dbl>, Sample_actual <chr>, Type <chr>, DATASET <chr>,
## # DECONTAM <chr>, CLASS <chr>, SITE_CLASS <chr>
unique(insitu_asv_wClass$CLASS)## [1] "Vent only" "Background only"
## [3] "Vent & background" "Vent, plume, & background"
## [5] "Plume only" "Vent & plume"
## [7] "Plume & background"
unique(insitu_asv_wClass$SITE_CLASS)## [1] "Gorda Ridge only" "Axial only"
## [3] "Mid-Cayman Rise" "Mid-Cayman Rise & Axial"
## [5] "Axial & Gorda Ridge" "Mid-Cayman Rise & Gorda Ridge"
## [7] "All sites"
unique(insitu_asv_wClass$SAMPLETYPE)## [1] "Vent" "Background" "Plume"
# head(insitu_asv_wClass)
insitu_asv_wClass %>%
# filter(SITE %in% selection) %>%
filter(Domain == "Eukaryota") %>%
filter(!is.na(Supergroup)) %>%
# Average across replicates
group_by(FeatureID, SAMPLENAME, VENT, Supergroup, Phylum, CLASS, SITE_CLASS) %>%
summarise(AVG = mean(value)) %>%
separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>%
mutate(REGION = case_when(
SITE == "GordaRidge" ~ "Gorda Ridge",
SITE %in% mcr ~ "Mid-Cayman Rise",
SITE == "Axial" ~ "Axial")) %>%
mutate(VENTNAME = case_when(
REGION == "Gorda Ridge" ~ VENT,
REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
)) %>% select(-Sample_tmp) %>%
unite(SAMPLE, REGION, VENTNAME, sep = " ", remove = FALSE) %>%
filter(CLASS == "Vent only") %>%
group_by(Supergroup, SITE_CLASS) %>%
summarise(SEQ_SUM = sum(AVG),
ASV_COUNT = n()) %>%
pivot_longer(cols = c(SEQ_SUM, ASV_COUNT)) %>%
filter(name == "SEQ_SUM") %>%
ggplot(aes(x = SITE_CLASS, y = value, fill = Supergroup)) +
geom_hline(yintercept = 0) +
geom_segment(aes(x = SITE_CLASS, xend = SITE_CLASS,
y = 0, yend = value, color = Supergroup),
lineend = "butt", size = 1) +
geom_point(size = 2, shape = 19, aes(color = Supergroup)) +
scale_fill_manual(values = c("#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525")) +
scale_color_manual(values = c("#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525")) +
theme_bw() +
facet_grid(. ~ Supergroup, scales = "free") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, color = "black", size = 11),
axis.text.y = element_text(color = "black", size = 12),
panel.spacing.x = unit(0, "lines"),panel.spacing.y = unit(0, "lines"),
panel.border = element_blank(),
panel.grid = element_blank(),
strip.background.x = element_blank(),
strip.text = element_text(size = 11),
legend.position = "none") +
coord_flip() +
labs(x = "", y ="Total sequences", title = "Number of 'vent-only' sequences by Supergroup & location")## Warning: Expected 4 pieces. Additional pieces discarded in 25357 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
# ?scale_fill_brewer# head(insitu_asv_wClass)
insitu_asv_wClass %>%
# filter(SITE %in% selection) %>%
filter(Domain == "Eukaryota") %>%
filter(!is.na(Supergroup)) %>%
# Average across replicates
group_by(FeatureID, SAMPLENAME, VENT, Supergroup, Phylum, CLASS, SITE_CLASS) %>%
summarise(AVG = mean(value)) %>%
separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>%
mutate(REGION = case_when(
SITE == "GordaRidge" ~ "Gorda Ridge",
SITE %in% mcr ~ "Mid-Cayman Rise",
SITE == "Axial" ~ "Axial")) %>%
mutate(VENTNAME = case_when(
REGION == "Gorda Ridge" ~ VENT,
REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
)) %>% select(-Sample_tmp) %>%
unite(SAMPLE, REGION, VENTNAME, sep = " ", remove = FALSE) %>%
filter(CLASS == "Vent only") %>%
group_by(Supergroup, SITE_CLASS) %>%
summarise(SEQ_SUM = sum(AVG),
ASV_COUNT = n()) %>%
pivot_longer(cols = c(SEQ_SUM, ASV_COUNT)) %>%
filter(name != "SEQ_SUM") %>%
ggplot(aes(x = SITE_CLASS, y = value, fill = Supergroup)) +
geom_hline(yintercept = 0) +
geom_segment(aes(x = SITE_CLASS, xend = SITE_CLASS,
y = 0, yend = value, color = Supergroup),
lineend = "butt", size = 1) +
geom_point(size = 2, shape = 19, aes(color = Supergroup)) +
scale_fill_manual(values = c("#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525")) +
scale_color_manual(values = c("#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525")) +
theme_bw() +
facet_grid(. ~ Supergroup, scales = "free") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, color = "black", size = 11),
axis.text.y = element_text(color = "black", size = 12),
panel.spacing.x = unit(0, "lines"),panel.spacing.y = unit(0, "lines"),
panel.border = element_blank(),
panel.grid = element_blank(),
strip.background.x = element_blank(),
strip.text = element_text(size = 11),
legend.position = "none") +
coord_flip() +
labs(x = "", y ="Total ASVs", title = "Number of 'vent-only' ASVs by Supergroup & location")## Warning: Expected 4 pieces. Additional pieces discarded in 25357 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
# ?scale_fill_brewerImport grazing data as output from previous, plot with bubbles underneath of specific parameters.
load("asv-tables-processed-16102021.RData", verbose = T)## Loading objects:
## asv_insitu
## asv_insitu_qc
## insitu_asv_wClass
# head(asv_insitu_qc)
head(asv_insitu_qc %>% select(SAMPLENAME, TEMP, pH, Mg, ProkConc) %>% distinct())## # A tibble: 6 × 5
## SAMPLENAME TEMP pH Mg ProkConc
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 GordaRidge Vent 2019 Venti Latte Vent039 11 6.4 50.9 111192.
## 2 GordaRidge Vent 2019 Venti Latte Vent040 11 6.4 50.9 111192.
## 3 GordaRidge Background 2019 Shallow seawater BSW081 8.6 NA 51.8 NA
## 4 Axial Vent 2013 Boca FS905 NA NA NA NA
## 5 Axial Vent 2014 Skadi FS910 NA NA NA NA
## 6 GordaRidge Background 2019 Deep seawater BSW056 1.8 7.8 51.8 39100
Plot sequence relative abundance by temperature and prokaryote concentration. These two parameters were chosen because I have the most metadata from them. If a sample was not countable or had no temperature record, it was removed.
asv_insitu_qc %>%
# filter(SITE %in% selection) %>%
filter(!is.na(TEMP)) %>%
filter(!is.na(ProkConc)) %>%
filter(Domain == "Eukaryota") %>%
mutate(Supergroup = ifelse(is.na(Supergroup), "Unknown Eukaryota", Supergroup),
Phylum = ifelse(is.na(Phylum), "Unknown", Phylum),
Phylum = ifelse(Phylum == "Alveolata_X", "Ellobiopsidae", Phylum),
Supergroup = ifelse(Supergroup == "Alveolata", paste(Supergroup, Phylum, sep = "-"), Supergroup)) %>%
group_by(FeatureID, Taxon, Domain, Supergroup, Phylum, Class, Order, Family, Genus, Species,
VENT, SITE, SAMPLETYPE, YEAR, DATASET, TEMP, ProkConc) %>%
summarise(SEQ_AVG_REP = mean(value)) %>%
ungroup() %>%
unite(SupergroupPhylum, Supergroup, Phylum, sep = "-") %>%
group_by(SITE, SAMPLETYPE, VENT, YEAR, SupergroupPhylum, TEMP, ProkConc) %>%
summarise(SEQ_SUM = sum(SEQ_AVG_REP)) %>%
ggplot(aes(x = ProkConc, y = as.numeric(TEMP), fill = SITE, shape = SAMPLETYPE)) +
geom_point(color = "black", aes(size = SEQ_SUM)) +
scale_size_continuous(range = c(4,9)) +
scale_shape_manual(values = c(21, 23, 24)) +
scale_x_log10() +
facet_wrap(SupergroupPhylum ~., scale = "free") +
theme_linedraw() +
theme(axis.text = element_text(color = "black", size = 12),
strip.background = element_blank(), strip.text = element_text(color = "black"),
legend.position = "right") +
# scale_y_continuous(expand = c(0,0)) +
scale_fill_manual(values = c("#feb24c", "#addd8e", "#de2d26", "#1c9099")) +
guides(fill = guide_legend(override.aes = list(shape = c(21))),
shape = guide_legend(override.aes = list(fill = "black"))) +
labs(x = bquote("Cells "~mL^-1 ~hr^-1), y = "Temperature (C)")Repeat above plot, but with CLR transformed data.
df_wide_tmp <- asv_insitu_qc %>%
filter(!is.na(TEMP)) %>%
filter(!is.na(ProkConc)) %>%
filter(Domain == "Eukaryota") %>%
filter(value > 0) %>%
# Average across replicates
group_by(FeatureID, SAMPLENAME, VENT, Domain, Supergroup, Phylum, Class, Order, Family, Genus, Species, TEMP, ProkConc) %>%
summarise(AVG = mean(value)) %>%
ungroup() %>%
# Sum to the Order taxonomic classification
unite(SAMPLENAME_2, SAMPLENAME, VENT, TEMP, ProkConc, sep = "_") %>%
unite(TAX, FeatureID, Supergroup, Phylum, sep = " ") %>%
select(TAX, SAMPLENAME_2, AVG) %>%
pivot_wider(names_from = SAMPLENAME_2, values_from = AVG, values_fill = 0) %>%
column_to_rownames(var = "TAX")
## Take wide data frame and CLR transform, pivot to wide, and plot
clr_long_df <- data.frame(compositions::clr(df_wide_tmp)) %>%
rownames_to_column(var = "TAX") %>%
pivot_longer(cols = starts_with(all), values_to = "CLR", names_to = "SAMPLENAME_2") %>%
separate(SAMPLENAME_2, c("SAMPLENAME", "VENT", "TEMP", "ProkConc"), sep = "_") %>%
separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>%
mutate(VENT = str_replace_all(VENT, "\\.", " ")) %>%
mutate(REGION = case_when(
SITE == "GordaRidge" ~ "Gorda Ridge",
SITE %in% mcr ~ "Mid-Cayman Rise",
SITE == "Axial" ~ "Axial")) %>%
mutate(VENTNAME = case_when(
REGION == "Gorda Ridge" ~ VENT,
REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
)) %>% select(-Sample_tmp) %>%
unite(SAMPLE, REGION, VENTNAME, sep = " ", remove = FALSE) %>%
separate(TAX, c("ASVid","Supergroup", "Phylum"), sep = " ", remove = TRUE) %>%
unite(SupergroupPhylum, Supergroup, Phylum, sep = "-")## Warning: Expected 4 pieces. Additional pieces discarded in 173745 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
# head(clr_long_df)
## Plot
clr_long_df %>%
filter(SAMPLETYPE == "Vent") %>%
ggplot(aes(x = as.numeric(ProkConc), y = as.numeric(TEMP), fill = CLR, shape = REGION)) +
geom_point(color = "black", size = 3, aes(fill = CLR, shape = REGION)) +
scale_fill_gradient2(low = "#4575b4", mid = "white", high = "#d73027", na.value = "grey50") +
scale_shape_manual(values = c(21, 23, 24, 25)) +
scale_x_log10() +
facet_wrap(SupergroupPhylum ~ ., scale = "free") +
theme_linedraw() +
theme(axis.text = element_text(color = "black", size = 12),
strip.background = element_blank(), strip.text = element_text(color = "black"),
legend.position = "right") +
guides(fill = guide_legend(override.aes = list(shape = c(21))),
shape = guide_legend(override.aes = list(fill = "black"))) +
labs(x = bquote("Cells "~mL^-1 ~hr^-1), y = "Temperature (C)")pending
pending CLR to triangle plot? triangle plot with relative abundance - are there distinct signatures of vent endemics by region?? Are there clusters on the triangle plot??
Subset dataset to create endemic dataset and a vent inclusive dataset.
# head(insitu_asv_wClass)
# unique(insitu_asv_wClass$CLASS)
vent_endemic <- insitu_asv_wClass %>%
filter(CLASS == "Vent only")
# unique(vent_endemic$SAMPLETYPE)
vent_broad <- insitu_asv_wClass %>%
filter(SAMPLETYPE == "Vent")
length(unique(vent_broad$FeatureID))## [1] 10060
length(unique(vent_endemic$FeatureID))## [1] 8109
sum(vent_broad$value)## [1] 2923324
sum(vent_endemic$value)## [1] 1278117
make_bar_relabun(vent_endemic, all)make_bar_relabun(vent_broad, all)make_pca(vent_endemic, all)## Warning: Expected 4 pieces. Additional pieces discarded in 10517 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
make_pca(vent_broad, all)## Warning: Expected 4 pieces. Additional pieces discarded in 18955 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
make_pca(vent_endemic, axial)## Warning: Expected 4 pieces. Additional pieces discarded in 4421 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
Diversity model/estimation and network analysis to be run on HPC.
load("asv-tables-processed-18102021.RData", verbose = T)DivNet package - diversity estimation hypothesis testing from Amy Willis’s group. This will also characterize the uncertainty of the richness estimate. Richness estimation is flawed because of sample depth and processing methods.
library(phyloseq); library(breakaway); library(DivNet)
# Select eukaryotes only and create wide format dataframe
insitu_wide <- asv_insitu_qc %>%
filter(Domain == "Eukaryota") %>%
filter(!grepl("_Plume001_", SAMPLE)) %>% #removing "near vent background", not relevant in other data sets
select(FeatureID, Taxon, SAMPLE, value) %>%
pivot_wider(names_from = SAMPLE, values_from = value, values_fill = 0)
# head(insitu_wide)
insitu_samples <- as.character(colnames(insitu_wide %>% select(-Taxon, -FeatureID)))
# insitu_samplesinsitu_tax_matrix <- insitu_wide %>%
select(FeatureID, Taxon) %>%
separate(Taxon, c("Domain", "Supergroup",
"Phylum", "Class", "Order",
"Family", "Genus", "Species"), sep = ";") %>%
column_to_rownames(var = "FeatureID") %>%
as.matrix
insitu_asv_matrix <- insitu_wide %>%
select(-Taxon) %>%
column_to_rownames(var = "FeatureID") %>%
as.matrix
# Align row names for each matrix
rownames(insitu_tax_matrix) <- row.names(insitu_asv_matrix)
## Extract relevant metadata information
# head(metadata)
metadata_insitu <- metadata %>%
filter(SAMPLE %in% insitu_samples) %>% # from reformatting df above
select(SAMPLE, VENT, SITE, SAMPLETYPE, YEAR) %>%
unite(SAMPLELABEL, VENT, SITE, SAMPLETYPE, YEAR, sep = "_", remove = FALSE) %>%
unite(TYPE_SITE, SITE, SAMPLETYPE, sep = "_", remove = FALSE)
# View(metadata_insitu)# Import asv and tax matrices
ASV = otu_table(insitu_asv_matrix, taxa_are_rows = TRUE)
TAX = tax_table(insitu_tax_matrix)
phylo_obj <- phyloseq(ASV, TAX)
# Import metadata as sample data in phyloseq
samplenames <- sample_data(metadata_insitu)
# View(samplenames)
# join as phyloseq object
physeq_insitu = merge_phyloseq(phylo_obj, samplenames)
## Check
physeq_insitu
# head(tax_matrix[1:2,])Note options to apply divnet to
# Glom tax levels at the Order level, then perform divnet analysis
# order_divnet <- divnet(tax_glom(physeq_insitu, taxrank = "Order"), base = 3)
# order_divnet <- divnet(tax_glom(physeq_insitu, taxrank = "Order"), X = "SAMPLELABEL", base = 3)
# order_divnet_TYPE <- divnet(tax_glom(physeq_insitu, taxrank = "Order"), X = "SAMPLETYPE", base = 3)
# order_divnet_TYPE_SITE <- divnet(tax_glom(physeq_insitu, taxrank = "Order"), X = "TYPE_SITE", base = 3)
# fam_divnet <- divnet(tax_glom(physeq_insitu, taxrank = "Family"), base = 3)# head(metadata_insitu)
# fxn_extract_divet <- function(df){
# df$shannon %>% summary %>%
# pivot_longer(cols = starts_with("estimate"), names_to = "ESTIMATE-shannon", values_to = "Shannon") %>%
# pivot_longer(cols = starts_with("error"), names_to = "ERROR-shannon", values_to = "Shannon-error") %>%
# pivot_longer(cols = starts_with("lower"), names_to = "LOWER-shannon", values_to = "Shannon-lower") %>%
# pivot_longer(cols = starts_with("upper"), names_to = "UPPER-shannon", values_to = "Shannon-upper") %>%
# left_join(df$simpson %>% summary %>%
# pivot_longer(cols = starts_with("estimate"), names_to = "ESTIMATE-simpson", values_to = "Simpson") %>%
# pivot_longer(cols = starts_with("error"), names_to = "ERROR-simpson", values_to = "Simpson-error") %>%
# pivot_longer(cols = starts_with("lower"), names_to = "LOWER-simpson", values_to = "Simpson-lower") %>%
# pivot_longer(cols = starts_with("upper"), names_to = "UPPER-simpson", values_to = "Simpson-upper"),
# by = c("sample_names" = "sample_names")) %>%
# left_join(metadata_insitu %>% rownames_to_column(var = "sample_names")) %>%
# select(-sample_names, -ends_with("-simpson"), -ends_with("-shannon"), -starts_with("model."), -starts_with("name.")) %>%
# distinct()
# }
#
# order_alpha_18s <- fxn_extract_divet(order_divnet)
# order_alpha_TYPE <- fxn_extract_divet(order_divnet_TYPE)
# order_alpha_TYPE_SITE <- fxn_extract_divet(order_divnet_TYPE_SITE)# head(order_alpha_18s)
# plot_grid(order_alpha_18s %>%
# # ggplot(aes(x = VENT, y = Shannon)) +
# ggplot(aes(x = SAMPLETYPE, y = Shannon, group = SAMPLETYPE)) +
# # geom_errorbar(aes(ymin = `Shannon-lower`, ymax = `Shannon-upper`), color = "#525252", width = 0.2) +
# # geom_point(shape = 21, color = "#525252", size = 2, aes(fill = SAMPLETYPE)) +
# # facet_grid(. ~ SITE + SAMPLETYPE + YEAR, space = "free_x", scales = "free_x") +
# geom_violin(aes(fill = SAMPLETYPE), color = "#525252", alpha = 0.5, width = 0.5, draw_quantiles = c(0.25, 0.5, 0.75)) +
# geom_jitter(shape = 21, color = "#525252", size = 2, aes(fill = SAMPLETYPE)) +
# scale_fill_brewer(palette = "Set2") +
# theme_linedraw() +
# theme(axis.text.y = element_text(size = 14),
# axis.text.x = element_blank(),
# strip.background = element_blank(),
# strip.text = element_text(color = "black"),
# legend.position = "none",
# axis.ticks.x = element_blank()) +
# labs(x = "", y = "Shannon"),
# order_alpha_18s %>%
# # ggplot(aes(x = VENT, y = Simpson)) +
# ggplot(aes(x = SAMPLETYPE, y = Shannon, group = SAMPLETYPE)) +
# # geom_errorbar(aes(ymin = `Simpson-lower`, ymax = `Simpson-upper`), color = "#525252", width = 0.2) +
# # geom_point(shape = 21, color = "#525252", size = 2, aes(fill = SAMPLETYPE)) +
# # facet_grid(. ~ SITE + SAMPLETYPE + YEAR, space = "free_x", scales = "free_x") +
# geom_violin(aes(fill = SAMPLETYPE), color = "#525252", alpha = 0.5, width = 0.5, draw_quantiles = c(0.25, 0.5, 0.75)) +
# geom_jitter(shape = 21, color = "#525252", size = 2, aes(fill = SAMPLETYPE)) +
# scale_fill_brewer(palette = "Set2") +
# theme_linedraw() +
# theme(axis.text.x = element_text(vjust = 1, hjust = 0.5, size = 14),
# axis.text = element_text(size = 14),
# strip.background = element_blank(),
# strip.text = element_blank(),
# legend.title = element_blank(),
# legend.position = "bottom") +
# labs(x = "", y = "Simpson"),
# ncol = 1, axis = c("lrt"), align = c("vh"))Save for presentation
# svg("Shannon-violin-plot.svg",)
# order_alpha_18s %>%
# # ggplot(aes(x = VENT, y = Shannon)) +
# ggplot(aes(x = SAMPLETYPE, y = Shannon, group = SAMPLETYPE)) +
# # geom_errorbar(aes(ymin = `Shannon-lower`, ymax = `Shannon-upper`), color = "#525252", width = 0.2) +
# # geom_point(shape = 21, color = "#525252", size = 2, aes(fill = SAMPLETYPE)) +
# # facet_grid(. ~ SITE + SAMPLETYPE + YEAR, space = "free_x", scales = "free_x") +
# geom_violin(aes(fill = SAMPLETYPE), color = "#525252", alpha = 0.5, width = 0.5, draw_quantiles = c(0.25, 0.5, 0.75)) +
# geom_jitter(shape = 21, color = "#525252", size = 3, aes(fill = SAMPLETYPE)) +
# # scale_fill_brewer(palette = "Set2") +
# scale_fill_manual(values = c("#1c9099", "#fd8d3c", "#f768a1")) +
# theme_linedraw() +
# theme(axis.text.x = element_text(vjust = 1, hjust = 0.5, size = 14),
# axis.text = element_text(size = 14),
# strip.background = element_blank(),
# strip.text = element_blank(),
# legend.title = element_blank(),
# legend.position = "bottom") +
# labs(x = "", y = "Shannon")
# dev.off()# head(order_alpha_TYPE)
# plot_grid(order_alpha_TYPE %>%
# select(-SAMPLELABEL, -VENT, -SITE, -TYPE_SITE, -YEAR) %>%
# distinct() %>%
# ggplot(aes(x = SAMPLETYPE, y = Shannon)) +
# geom_errorbar(aes(ymin = `Shannon-lower`, ymax = `Shannon-upper`), color = "#525252", width = 0.2) +
# geom_point(shape = 21, color = "#525252", size = 2, aes(fill = SAMPLETYPE)) +
# # facet_grid(. ~ SITE + SAMPLETYPE + YEAR, space = "free_x", scales = "free_x") +
# scale_fill_brewer(palette = "Set2") +
# theme_linedraw() +
# theme(axis.text.x = element_blank(),
# strip.background = element_blank(),
# strip.text = element_text(color = "black"),
# legend.position = "none",
# axis.ticks = element_blank()) +
# labs(x = "", y = "Shannon"),
# order_alpha_TYPE %>%
# select(-SAMPLELABEL, -VENT, -SITE, -TYPE_SITE, -YEAR) %>%
# distinct() %>%
# ggplot(aes(x = SAMPLETYPE, y = Simpson)) +
# geom_errorbar(aes(ymin = `Simpson-lower`, ymax = `Simpson-upper`), color = "#525252", width = 0.2) +
# geom_point(shape = 21, color = "#525252", size = 2, aes(fill = SAMPLETYPE)) +
# # facet_grid(. ~ SITE + SAMPLETYPE + YEAR, space = "free_x", scales = "free_x") +
# scale_fill_brewer(palette = "Set2") +
# theme_linedraw() +
# theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
# strip.background = element_blank(),
# strip.text = element_blank(),
# legend.title = element_blank(),
# legend.position = "bottom") +
# labs(x = "", y = "Simpson"),
# ncol = 1, axis = c("lr"), align = c("v"))# head(order_alpha_TYPE_SITE)
# plot_grid(order_alpha_TYPE_SITE %>%
# select(-SAMPLELABEL, -YEAR, -VENT) %>%
# distinct() %>%
# ggplot(aes(x = SAMPLETYPE, y = Shannon)) +
# geom_errorbar(aes(ymin = `Shannon-lower`, ymax = `Shannon-upper`), color = "#525252", width = 0.2) +
# geom_point(shape = 21, color = "#525252", size = 2, aes(fill = SAMPLETYPE)) +
# facet_grid(. ~ SITE, space = "free_x", scales = "free_x") +
# scale_fill_brewer(palette = "Set2") +
# theme_linedraw() +
# theme(axis.text.x = element_blank(),
# strip.background = element_blank(),
# strip.text = element_text(color = "black"),
# legend.position = "none",
# axis.ticks = element_blank()) +
# labs(x = "", y = "Shannon"),
# order_alpha_TYPE_SITE %>%
# select(-SAMPLELABEL, -YEAR, -VENT) %>%
# distinct() %>%
# ggplot(aes(x = SAMPLETYPE, y = Simpson)) +
# geom_errorbar(aes(ymin = `Simpson-lower`, ymax = `Simpson-upper`), color = "#525252", width = 0.2) +
# geom_point(shape = 21, color = "#525252", size = 2, aes(fill = SAMPLETYPE)) +
# facet_grid(. ~ SITE, space = "free_x", scales = "free_x") +
# scale_fill_brewer(palette = "Set2") +
# theme_linedraw() +
# theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
# strip.background = element_blank(),
# strip.text = element_blank(),
# legend.title = element_blank(),
# legend.position = "bottom") +
# labs(x = "", y = "Simpson"),
# ncol = 1, axis = c("lr"), align = c("v"))